A generalization of the PST algorithm: modeling the sparse nature of protein sequences

نویسنده

  • Florencia G. Leonardi
چکیده

MOTIVATION A central problem in genomics is to determine the function of a protein using the information contained in its amino acid sequence. Variable length Markov chains (VLMC) are a promising class of models that can effectively classify proteins into families and they can be estimated in linear time and space. RESULTS We introduce a new algorithm, called Sparse Probabilistic Suffix Trees (SPST), that identifies equivalence between the contexts of a VLMC. We show that, in many cases, the identification of these equivalence can improve the classification rate of the classical Probabilistic Suffix Trees (PST) algorithm. We also show that better classification can be achieved by identifying representative fingerprints in the amino acid chains, and this variation in the SPST algorithm is called F-SPST.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

متن کامل

Modeling of measurement error in refractive index determination of fuel cell using neural network and genetic algorithm

Abstract: In this paper, a method for determination of refractive index in membrane of fuel cell on basis of three-longitudinal-mode laser heterodyne interferometer is presented. The optical path difference between the target and reference paths is fixed and phase shift is then calculated in terms of refractive index shift. The measurement accuracy of this system is limited by nonlinearity erro...

متن کامل

Comparing the Bidirectional Baum-Welch Algorithm and the Baum-Welch Algorithm on Regular Lattice

A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the b...

متن کامل

A Novel Image Denoising Method Based on Incoherent Dictionary Learning and Domain Adaptation Technique

In this paper, a new method for image denoising based on incoherent dictionary learning and domain transfer technique is proposed. The idea of using sparse representation concept is one of the most interesting areas for researchers. The goal of sparse coding is to approximately model the input data as a weighted linear combination of a small number of basis vectors. Two characteristics should b...

متن کامل

Analysis and Professional Designing of COBRA (Computationally Optimized Broadly Reactive Antigen) Vaccine for Bm86 midgut Protein of R. microplus and R. annulatus Ticks

Introduction: The cattle tick Rhipicephalus spp. causes significant economic losses due to diseases in animals and human. Bm86 is a midgut protein and vaccine candidate, which its sequences among the isolates of Ripsephalus spp are geographically separated, variable, and are the main reason for reducing effectiveness, and subsequently, the failure of the recombinant vaccines. Method: In this bi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 22 11  شماره 

صفحات  -

تاریخ انتشار 2006